NewsQA: A Machine Comprehension Dataset
Abstract
We present NewsQA, a challenging machine comprehension dataset of over 100,000 question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (25.3% F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at datasets.maluuba.com/NewsQA.
Similar resources
Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
We develop a technique for transfer learning in machine comprehension (MC) using a novel two-stage synthesis network (SynNet). Given a high-performing MC model in one domain, our technique aims to answer questions about documents in another domain, where we use no labeled data of question-answer pairs. Using the proposed SynNet with a pretrained model from the SQuAD dataset on the challenging N...
Dataset for the First Evaluation on Chinese Machine Reading Comprehension
Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, existing reading comprehension datasets are mostly in English. To add diversity in reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset for accelerating related research in the community. The proposed dataset contains two diff...
Machine Comprehension Based on Learning to Rank
Machine comprehension plays an essential role in NLP and has been widely explored with datasets like MCTest. However, this dataset is too simple and too small for learning true reasoning abilities. Hermann et al. (2015) therefore released a large-scale news article dataset and proposed a deep LSTM reader system for machine comprehension. However, the training process is expensive. We therefore tr...
A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis
BACKGROUND: With the development of artificial intelligence (AI) technology centered on deep learning, computers have evolved to the point where they can read a given text and answer a question based on its context. This task is known as machine comprehension. Existing machine comprehension tasks mostly use datasets of general texts, such as news articles or elem...
CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension
We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful a...